Add a function to fill NaNs in grids by interpolation #440
💖 Thank you for opening your first pull request in this repository! 💖 A few things to keep in mind:
⭐ No matter what, we are really grateful that you put in the effort to do this! ⭐
Hi @Phssilva thank you for the implementation! I left some comments in the code with changes that are needed.
I really appreciate you taking the time to do this!
verde/tests/test_utils.py
Outdated
def test_fill_nans():
    """
    This function tests the fill_nans function.
    """

    grid = np.array([[1, np.nan, 3],
                     [4, 5, np.nan],
                     [np.nan, 7, 8]])
    filled_grid = fill_nans(grid)
    assert np.isnan(filled_grid).sum() == 0
The function is supposed to take an xarray.DataArray but the test gives it a numpy array. The test should match the expected use of the function. Please make the grid into a DataArray.
It would also be good to check if the final grid has the correct values in the NaNs. Right now, this only checks that the NaNs aren't there but the values could be completely wrong and we'd never know.
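To illustrate the second point, here is a minimal sketch of how a "no NaNs left" assertion can pass on a wrong result. The `bad_fill` helper is hypothetical and deliberately incorrect (it is not the function in this PR), and the expected values are the ones used later in this review.

```python
import numpy as np
import xarray as xr


def bad_fill(grid):
    """Hypothetical, deliberately wrong filler: replaces every NaN with zero."""
    return grid.fillna(0)


grid = xr.DataArray(
    [[1, np.nan, 3], [4, 5, np.nan], [np.nan, 7, 8]],
    dims=("northing", "easting"),
)
filled = bad_fill(grid)

# Weak check: only verifies that no NaNs remain.
weak_check_passes = int(np.isnan(filled.values).sum()) == 0

# Stronger check: compares every node against the values we expect.
expected = np.array([[1, 1, 3], [4, 5, 3], [4, 7, 8]], dtype=float)
strong_check_passes = bool(np.allclose(filled.values, expected))
```

The weak check passes even though the zeros are meaningless, while the comparison against expected values catches the problem.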
Hi Leo!
I changed the test to use a DataArray and added a check on the filled values. Could you please verify that the changes are correct? Thank you for your attention.
verde/utils.py
Outdated
@@ -14,6 +14,7 @@
import pandas as pd
import xarray as xr
from scipy.spatial import cKDTree
from sklearn.impute import KNNImputer
Please use verde.KNeighbors instead.
I switched to the KNeighbors class and used its fit and predict methods to fill in the values of the DataArray.
verde/utils.py
Outdated
Parameters
----------
grid : :class:`xarray.Dataset` or :class:`xarray.DataArray`
Should be only a DataArray and not a Dataset.
verde/utils.py
Outdated
for i, idx in enumerate(unknown_indices):
    grid[tuple(idx)] = predicted_values[i]

return grid
The output grid should be a copy of the input grid. We want to avoid changing the input values in-place. The code above will actually alter the input and could cause problems for users since their original grid with NaNs is now gone.
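A small sketch of the difference, assuming a NumPy-backed DataArray. Both helper names are hypothetical and fill with a constant only to keep the example short.

```python
import numpy as np
import xarray as xr


def fill_in_place(grid, value):
    """Hypothetical helper: overwrites NaNs in the caller's grid (to be avoided)."""
    grid.values[np.isnan(grid.values)] = value
    return grid


def fill_on_copy(grid, value):
    """Hypothetical helper: fills NaNs on a copy and leaves the input alone."""
    filled = grid.copy()  # DataArray.copy is a deep copy by default
    filled.values[np.isnan(filled.values)] = value
    return filled


original = xr.DataArray([[1.0, np.nan], [3.0, 4.0]], dims=("northing", "easting"))

safe = fill_on_copy(original, 0.0)
nans_after_copy = int(np.isnan(original.values).sum())  # input still has its NaN

unsafe = fill_in_place(original, 0.0)
nans_after_in_place = int(np.isnan(original.values).sum())  # input was modified
```

After `fill_in_place`, the returned object and the input are the same DataArray, so the user no longer has access to the grid with NaNs.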
I used a copy of the grid in the variable filled_grid, which is returned at the end of the function.
Thanks for doing the work @Phssilva! I left some more comments to get this on its way.
expected_values = xr.DataArray([[1, 1, 3],
                                [4, 5, 3],
                                [4, 7, 8]])
The DataArrays should contain coordinates as well. They should be as close as possible to the format of a real dataset. That's how we make the tests more robust.
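For example, a test grid with named dimensions and coordinates could be built roughly like this. The coordinate values and the `"scalars"` name are assumptions chosen for illustration, not values taken from the PR.

```python
import numpy as np
import xarray as xr

# Assumed coordinates for illustration; any 1D arrays matching the shape work.
easting = np.linspace(0, 20, 3)
northing = np.linspace(-5, 5, 3)

grid = xr.DataArray(
    [[1, np.nan, 3], [4, 5, np.nan], [np.nan, 7, 8]],
    coords={"northing": northing, "easting": easting},
    dims=("northing", "easting"),
    name="scalars",
)
```

The expected grid in the test can reuse the same `coords` and `dims`, so that the comparison also covers the coordinates of the output.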
unknown_indices = np.argwhere(np.isnan(grid.values))

knn_imputer = vd.KNeighbors()
easting, northing = not_nan_values[:, 0], not_nan_values[:, 1]
Use the actual coordinates of the grid instead of generating indices. This makes the interpolation work even if the grid is not uniform.
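A minimal sketch of why this matters, using a made-up single-row grid with uneven easting spacing. Measured in index units, both valid neighbours of the missing node look equally close; measured in coordinate units, one is clearly nearer.

```python
import numpy as np
import xarray as xr

# Made-up grid for illustration: the easting spacing is not uniform.
grid = xr.DataArray(
    [[1.0, np.nan, 3.0]],
    coords={"northing": [0.0], "easting": [0.0, 1.0, 10.0]},
    dims=("northing", "easting"),
)

easting = grid.easting.values
missing = np.isnan(grid.values[0])
known = ~missing

# Distance from the missing node to each valid node, in index units.
index = np.arange(easting.size)
index_distance = np.abs(index[known] - index[missing][0])

# The same distances measured with the actual coordinates.
coord_distance = np.abs(easting[known] - easting[missing][0])
```

With indices the two neighbours tie, so a nearest-neighbour fill cannot tell them apart; with coordinates the node at easting 0 is the nearest one.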
not_nan_values = np.argwhere(~np.isnan(grid.values))
unknown_indices = np.argwhere(np.isnan(grid.values))
Use the verde.grid_to_table function and then drop the NaNs from it. It's easier and preserves the coordinates of the grid.
predicted_values = knn_imputer.predict((easting, northing))

for i, idx in enumerate(unknown_indices):
    filled_grid[tuple(idx)] = predicted_values[i]
Instead of this, you could use knn.grid and pass in the coordinates of the original grid.
Co-authored-by: Leonardo Uieda <leo@uieda.com>
This PR adds a function that fills NaN values in a grid, along with a test for it.
Please review; I'm available for further revisions.
Relevant issues/PRs: